Poster: Project Ormin Date: Dec 23, 2016 4:46pm
Forum: opensource Subject: Workaround for non-standard alphabet not recognised by OCR?

I've just uploaded my first test item to the Internet Archive at https://archive.org/details/pride-and-prejudice-shavian-alphabet-edition . While the language is English, the script is Shavian, which for obvious reasons is not recognised by OCR.

For this reason, I have uploaded my own original files for PDF, TXT, EPUB and MOBI. I'm not sure I can generate original ...DJVU.TXT or ...ABBYY.GZ files to upload, so these derived files are full of nonsensical text. I'd like to add the item to the Open Library eventually, and having some formats full of nonsense text is obviously not ideal.
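As a rough illustration of the problem, one way to tell the nonsensical derived text apart from the original TXT would be to measure how much of each file falls inside the Shavian Unicode block (U+10450 to U+1047F). This is only a sketch: the function name `shavian_ratio` and the idea of checking downloaded copies locally are my own assumptions, not anything the Archive provides.

```python
# Shavian letters occupy the Unicode block U+10450..U+1047F.
SHAVIAN_START = 0x10450
SHAVIAN_END = 0x1047F


def shavian_ratio(text: str) -> float:
    """Return the share of alphabetic characters in `text` that are Shavian.

    Hypothetical helper for illustration only. Whitespace, digits and
    punctuation are ignored; an input with no letters yields 0.0.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    shavian = sum(1 for ch in letters if SHAVIAN_START <= ord(ch) <= SHAVIAN_END)
    return shavian / len(letters)


if __name__ == "__main__":
    # Two Shavian letters written as escapes, followed by two Latin letters.
    sample = "\U00010450\U0001046E ab"
    print(shavian_ratio(sample))
```

A ratio near 1 would be expected for the original TXT and a ratio near 0 for text that OCR has misread as Latin characters.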

Is there any way to work around this? I understand I can neither suppress nor edit the problem formats, but would welcome any suggestions.

Finally, is there a way to convert a test item into a permanent item without re-uploading everything and using server resources again? I started with a test item out of caution as it was my first upload, but I am happy with how it went.

Many thanks

Project Ormin

This post was modified by Project Ormin on 2016-12-24 00:45:10

This post was modified by Project Ormin on 2016-12-24 00:45:33

This post was modified by Project Ormin on 2016-12-24 00:46:06